EOG Signal Classification with Wavelet and Supervised Learning Algorithms KNN, SVM and DT

The work carried out in this paper consists of the classification of the physiological signal generated by eye movement, known as Electrooculography (EOG). When focusing on an object, the human eye performs simultaneous movements that generate a potential change originating between the retinal epithelium and the cornea, so the eyeball can be modeled as a dipole with a positive and a negative hemisphere. Supervised learning algorithms were implemented to classify five eye movements: left, right, down, up and blink. The Wavelet Transform was used to obtain frequency-domain information characterizing the EOG signal, which has a bandwidth of 0.5 to 50 Hz. Training results were obtained with K-Nearest Neighbor (KNN), 69.4%; Support Vector Machine (SVM), 76.9%; and Decision Tree (DT), 60.5%. Accuracy was checked through the Jaccard index and other metrics such as the confusion matrix and the ROC (Receiver Operating Characteristic) curve. As a result, the best classifier for this application was the SVM, as measured by the Jaccard index.


Introduction
The ocular muscles produce an electrical potential difference with an origin between the corneal pigment epithelium and the retina [1]. This differential is known in [2] as the Electrooculography (EOG) signal. EOG is obtained using silver electrodes placed superficially on the face, registering the horizontal channel (left-right movements) and the vertical channel (up and down movements).
In the literature reviewed [3][4][5][6][7], EOG is mostly applied by users with motor disabilities, who move their eyes to communicate. The monitoring of biological signals such as EOG therefore allows the integration of everyday objects: in [8], writing is performed by selecting from a limited group of words to compose short sentences; in [9], a wheelchair is controlled electrically by eye movements; in [10], eye movement is recognized from different parameters detected in the signal while the user views different abstract images; in [11], the mouse cursor is moved on receiving a signal from both eyes; and in [12], the directional control of a robot was designed with a method based on saccadic movements and eye reflexes, obtained as the average speed, maximum speed and voltage range in the developed model, which did not include the fixed gaze and blinking movement.
EOG signal parameters are mostly detected at low frequencies, in a bandwidth of 0.5 to 50 Hz. The eye movement classification is based on algorithms that implement the calculation of the signal derivative; each algorithm targets different parameters (average
Signal recording process that was saved in a file with extension .mat that served as an input to the classification algorithms; the process started with the placement of electrodes next to the acquisition system, followed by calibration tests, obtaining the signal and ending with the storage of a data file.
To validate the performance of each classifier, the EOG signal was divided into Horizontal and Vertical, positive action potentials corresponding to the Right/Horizontal movement (Figure 3a
Table 1 shows the range from the beginning to the end of each eye movement, starting with data 0 to 32.500 of the total samples. Once the range of each eye movement was obtained, the signal was processed with the Wavelet transform; Section 2.2 explains the dimensionality reduction achieved with the transform and the selection of the wavelet with the entropy method. Section 2.3 then describes the metrics calculated from the general signal as input to the classifier.

Wavelet
The frequency spectrum of the EOG signal was analyzed with two mathematical tools, the Fourier and Wavelet transforms. The EOG signal has a dynamic behavior, i.e., it varies in time and frequency. With Fourier analysis, the harmonics of the energy spectrum reduced the information available from the signal; thus, the Wavelet Transform [23,24] was implemented to analyze the different frequency levels of the signal obtained in tests of the different eyeball movements.
Wavelets are functions used to approximate data with variations, transients and nonstationary phenomena. In the implemented algorithm, data are processed at different resolutions: when a signal or function is observed through a wide "data window", fine waveform details are not observed, and the window is adjusted automatically when the resolution changes. Wavelet analysis consists of three steps: decomposition, thresholding and reconstruction.
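The three steps named above (decomposition, thresholding and reconstruction) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: it uses a single-level Haar wavelet for simplicity rather than the Reverse Biorthogonal wavelet selected later in the paper, soft thresholding is one common choice that the text does not specify, and the sample values and the threshold of 0.1 are made up.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_decompose(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) / SQRT2 for a, b in pairs]
    detail = [(a - b) / SQRT2 for a, b in pairs]
    return approx, detail

def soft_threshold(coeffs, limit):
    """Shrink coefficients toward zero; magnitudes at or below limit become 0."""
    return [math.copysign(max(abs(c) - limit, 0.0), c) for c in coeffs]

def haar_reconstruct(approx, detail):
    """Inverse of haar_decompose."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out

# Hypothetical samples: a slow step (eye movement) plus small ripple (noise).
eog = [0.0, 0.1, 1.0, 1.1, 1.0, 0.9, 0.0, -0.1]
approx, detail = haar_decompose(eog)
denoised = haar_reconstruct(approx, soft_threshold(detail, 0.1))
```

Without thresholding, reconstruction returns the original samples; with it, the small detail coefficients are removed and only the slow step remains, which is the smoothing effect described in the text.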
The continuous wavelet transform (CWT) can be defined as the sum over all time of the signal multiplied by scaled and shifted versions of the Mother Wavelet, as shown in Equation (1).
where a is the scale, t is time, b is the displacement, x(t) is the input function and ψ is the wavelet function. In the above equation, the wavelet is scaled by a and translated by b; x represents the EOG signal as a function of time t. Wavelets are grouped into families whose members have different parameters, and there are two ways to select a wavelet. The first is to search among the wavelet families for one with a shape similar to the signal; the second is to test different wavelets until the signal is smoothed without losing the points of interest of the original. Once a wavelet is selected, it is called the Mother Wavelet, which is represented by ψ in Equation (1).
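For reference, a commonly used form of the CWT that agrees with the symbol definitions above can be written as follows. The name W(a, b) for the transform, the 1/√|a| normalization and the complex conjugate on ψ are assumptions taken from the standard textbook definition; they are not confirmed by this text.

```latex
W(a, b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) dt
```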
The wavelet detail coefficients indicate the relationship between the signal and the Mother Wavelet, and this relationship reveals the frequency components of the signal.
The Mother Wavelet was determined using the entropy method, which provides levels [25] that quantify the amount of disorder of a data set. Applied to the processed signal, it indicates that the farther the result is from the original state, the greater the amount of disorder, which in turn alters the correlation between the processed signal and the original one.
Thus, a higher level of entropy indicated that the levels of detail coefficients had lost a significant amount of information. The calculation is presented in Equation (2). In [26], methods for Bayesian optimization of hyperparameters for the classification of stationary signals are presented; in this work, however, entropy was found to be a feasible way to obtain optimal hyperparameters for classification with supervised algorithms.
where S represents the data set on which the entropy is calculated and p(c) is the proportion of data points that belong to class c relative to the total number of data points in the set.
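With S and p(c) as defined above, the usual Shannon form of the entropy is H(S) = −Σ p(c) log2 p(c), summed over the classes c. The sketch below assumes that form; the base-2 logarithm and the quantization of a continuous signal into equal-width bins (so that it has discrete classes) are assumptions for illustration, and `quantize` is a hypothetical helper, not part of the paper.

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """H(S) = -sum over classes c of p(c) * log2 p(c)."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def quantize(signal, bins):
    """Map each sample to a bin index so a continuous signal has discrete classes."""
    lo, hi = min(signal), max(signal)
    width = (hi - lo) / bins or 1.0  # guard against a constant signal
    return [min(int((v - lo) / width), bins - 1) for v in signal]

# Two equally likely classes carry exactly one bit of entropy.
balanced = shannon_entropy(["up", "up", "down", "down"])
```

A candidate wavelet could then be scored by calling `shannon_entropy(quantize(coefficients, bins))` and comparing the values across candidates, with lower values indicating less disorder.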
The application of the Mother Wavelet to the EOG signal is shown in Figure 6, where the signal has a different shape from the original; this is due to the detail coefficients that were applied, which allowed important data to be obtained without loss of information. By obtaining the Wavelet detail level data, we obtain a new signal with the most relevant information of the original one. The results obtained in the frequency spectrum were grouped into windows of four data points, giving 508 windows for each EOG channel, and nine metrics were calculated, as shown in Table 2.

Table 2. Frequency domain EOG signal metrics.

Metric | Definition
Root Mean Square (RMS) | Continuous power without distorting the original signal.
AMP | Amplifies when creating an amplifier object with default property value.
Maximum | Maximum data of the frequency group.
Variance | A measure of dispersion that represents the variability of a data series with respect to its mean.
Covariance | Value reflecting the amount by which two variables vary jointly with respect to their arithmetic means.
Median | Central value of a data set.
Average | Result obtained by adding several quantities of the amount of data.
Pspectrum | Returns the scale of the frequency spectrum.
Power | Sum of the absolute squares of its time domain samples divided by the length of the signal.

Descriptions of each metric that were applied to the signal to obtain data for the classification algorithm.
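As a rough illustration of how such per-window features can be computed, the sketch below splits a signal into windows of four samples and evaluates six of the nine metrics. Several points are assumptions: the windows are taken as non-overlapping, the variance uses the sample (N − 1) normalization, and AMP, Covariance and Pspectrum are omitted because their exact computation is not detailed here. The input values are made up.

```python
import math
import statistics

def split_windows(signal, size=4):
    """Non-overlapping windows; a trailing partial window is dropped."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]

def window_metrics(window):
    """Subset of the Table 2 metrics for one window of samples."""
    # Power: sum of absolute squares divided by the window length.
    power = sum(abs(v) ** 2 for v in window) / len(window)
    return {
        "rms": math.sqrt(power),
        "maximum": max(window),
        "variance": statistics.variance(window),
        "median": statistics.median(window),
        "average": statistics.fmean(window),
        "power": power,
    }

# Hypothetical samples; the ninth value does not fill a window and is dropped.
samples = [1.0, -1.0, 1.0, -1.0, 0.0, 2.0, 0.0, 2.0, 5.0]
features = [window_metrics(w) for w in split_windows(samples)]
```

Each dictionary in `features` corresponds to one window, so a full channel would yield one feature row per window for the classifiers.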

Classification Algorithms
The implementation of a classification algorithm is one of the requirements to identify offline classes whose membership is known based on training. In this study, offline classification was implemented using three supervised learning algorithms: SVM, KNN and DT.
To compare the classification results, the classes were labeled (Table 3) before being input into the different algorithms.
Table 3. Labeling of eye movements with their assigned class for input into the classification algorithm. Each movement was assigned a class, from class 0 to class 4.

K-Nearest Neighbors Algorithm (KNN)
K-Nearest Neighbor (KNN) is a supervised learning algorithm that classifies by proximity: each case is assigned, by majority vote, to the most common class among its K nearest neighbors, where nearness is measured by a distance function, here the Euclidean distance shown in Equation (3).
where x corresponds to the query point and y to a candidate neighbor; the distance determines which neighbors are the nearest. The application of the equation is shown in Algorithm 1 (KNN).
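The Euclidean distance between a query point x and a neighbor y is d(x, y) = √(Σ (x_i − y_i)²), and the vote can be sketched as below. This is a minimal plain-Python illustration, not the implementation used in the paper; the two-dimensional feature vectors and their labels are hypothetical, and the function names are chosen here for the example.

```python
import math
from collections import Counter

def euclidean(x, y):
    """d(x, y) = sqrt(sum_i (x_i - y_i)^2)."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def knn_predict(train_x, train_y, query, k=4):
    """Label the query by majority vote among its k nearest training points."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical feature vectors with two of the five movement classes.
train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = [0, 0, 0, 1, 1, 1]
predicted = knn_predict(train_x, train_y, (0.05, 0.05), k=3)
```

Every prediction computes the distance to all training points, which is why the cost grows with the volume of data.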
Algorithm 1 KNN
The label with the most representatives in the set of K neighbors is chosen
end for
Ensure: Determine the accuracy and the best neighbor

Support Vector Machine Algorithm (SVM)
Support Vector Machines are supervised learning classifiers that work by mapping data into a feature space so that data points can be categorized even if they are not linearly separable. The features of new data can then be used to define the group to which a new record belongs. To allow some flexibility, the algorithm has a parameter, C, which controls the trade-off between training errors and rigid margins, creating a margin that tolerates some classification errors, i.e., it gives control over the classification errors. The mathematical function used for the transformation is known as a kernel. The polynomial kernel shown in Equation (4) was used.
$$K(x_i, x_j) = (\gamma x_i^{T} x_j + r)^{d}, \quad \gamma > 0 \qquad (4)$$
where $K(x_i, x_j)$ corresponds to the n × n matrix of kernel elements, $x_i$ and $x_j$ correspond to the feature hyperplane, the support vectors obtained with $(\gamma x_i^{T} x_j + r)^{d}$ act as a separation between classes and represent the data for the measurement, and r is the parameter that is adjusted or calibrated. The application of the equation is shown in Algorithm 2 (SVM).

Algorithm 2 SVM
Require: Determine the sample of set data (80% training and 20% testing)
Ensure: Determine the accuracy
Select kernel
Select the optimal value of the cost and gamma for SVM
while (stopping condition is not met) do
    K(x_i, x_j) = (γ x_i^T x_j + r)^d, γ > 0
    Implement SVM train step for each data point
    Implement SVM classification for testing data points
end while
return accuracy
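The kernel evaluated inside Algorithm 2 can be written directly from its formula. The sketch below only computes kernel values and the n × n kernel matrix; it does not train an SVM. The default values for gamma, r and degree are hypothetical placeholders, since the calibrated values are not given in this section.

```python
def polynomial_kernel(x_i, x_j, gamma=1.0, r=1.0, degree=3):
    """K(x_i, x_j) = (gamma * <x_i, x_j> + r) ** degree, with gamma > 0."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    dot = sum(a * b for a, b in zip(x_i, x_j))
    return (gamma * dot + r) ** degree

def gram_matrix(points, **kernel_args):
    """n x n matrix of kernel values for every pair of points."""
    return [[polynomial_kernel(p, q, **kernel_args) for q in points]
            for p in points]

# <(1, 2), (3, 4)> = 11, so with gamma = 1, r = 1, degree = 2: (11 + 1)^2 = 144.
value = polynomial_kernel((1.0, 2.0), (3.0, 4.0), gamma=1.0, r=1.0, degree=2)
```

The kernel matrix is symmetric because the dot product is, which is what lets an SVM solver work with pairwise similarities instead of explicit high-dimensional features.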

Decision Tree Algorithm (DT)
The decision tree algorithm is a machine learning algorithm for classification in which each internal node represents a feature, each branch represents a decision rule, and each leaf node represents the result. The top node in a decision tree is known as the root node. The tree partitions the data according to attribute values and splits recursively, a process called recursive partitioning. Its structure helps to make decisions from a training set containing class labels and predictor variables, each of which can be inspected for a decision or split that results in a left node and a right node. The process starts from the root of the tree and ends at a leaf node, which gives an output class. Each partition is performed with the Gini index, which is presented in Equation (5):
$$gdi = 1 - \sum_{i} p^{2}(i) \qquad (5)$$
where $1 - \sum_{i} p^{2}(i)$ corresponds to the subnode calculation, $\sum_{i} p^{2}(i)$ to the sum of the squared probabilities and i to the data. The application of the equation is shown in Algorithm 3 (DT).

Algorithm 3 DT
Require: Determine the sample of set data (80% training and 20% testing).
Each data point is analyzed as the root node of the decision tree is assigned.
Each member is assigned a child node.
Each of the members of the tree is analyzed and a label is assigned
    gdi = 1 − ∑_i p²(i)
Predictions are made based on the result of each child with the labeling of each member.
return accuracy
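The Gini calculation used at each split can be illustrated as follows. This sketch scores candidate thresholds on a single feature by the size-weighted Gini index of the two child nodes; the weighting and the `best_threshold` helper are common practice assumed here for illustration, not details taken from Algorithm 3.

```python
from collections import Counter

def gini(labels):
    """gdi = 1 - sum_i p(i)^2, where p(i) is the share of class i in the node."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def best_threshold(values, labels):
    """Threshold on one feature that minimises the weighted Gini of both children."""
    best = (None, float("inf"))
    for t in sorted(set(values))[:-1]:  # the largest value would leave the right child empty
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical feature values for two classes that separate cleanly at 0.2.
split = best_threshold([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1])
```

A pure node has a Gini index of 0, so a split that sends each class to its own child scores 0; applying the same search recursively to each child gives the recursive partitioning described above.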

Jaccard Index
The well-known Jaccard similarity index is designed to measure the similarity between sample sets. It can be used both for function-based analysis, typically applied to study the resemblance of a small number of sets, and for the analysis of large data sets; it is calculated by Equation (6).
where a is the data in group A, b is the data in group B and c is the number of elements present in both groups A and B.
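Reading a and b as the sizes of groups A and B (an interpretation of the definitions above), the standard Jaccard index is J = c / (a + b − c), i.e., the size of the intersection divided by the size of the union. The sketch below assumes that form; returning 1.0 for two empty sets is a convention chosen here, and `jaccard_for_class` is a hypothetical helper showing one way to apply the index to predicted labels.

```python
def jaccard_index(group_a, group_b):
    """J(A, B) = |A intersect B| / |A union B| = c / (a + b - c)."""
    a, b = set(group_a), set(group_b)
    union = len(a | b)
    return len(a & b) / union if union else 1.0  # convention for two empty sets

def jaccard_for_class(y_true, y_pred, label):
    """Jaccard index between the samples truly in a class and those predicted in it."""
    true_idx = {i for i, y in enumerate(y_true) if y == label}
    pred_idx = {i for i, y in enumerate(y_pred) if y == label}
    return jaccard_index(true_idx, pred_idx)

# a = 3, b = 3, c = 2, so J = 2 / (3 + 3 - 2) = 0.5.
score = jaccard_index({1, 2, 3}, {2, 3, 4})
```

For a classifier, the per-class value equals TP / (TP + FP + FN), so it penalizes both missed and spurious predictions of that class.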

Methodology
The simulations were run on a laptop computer with the following characteristics: an AMD Ryzen 5 3450U processor with Radeon Vega Mobile Gfx, a processor speed of 2.10 to 3.5 GHz and a 4 MB processor cache; 16 GB (2 × 8, dual channel) of DDR4 memory at 3000 MHz; a 256 GB Crucial SSD; an AMD Radeon 73 graphics card; and the Windows 11 Home Single Language operating system, version 22H2, 64-bit.
MATLAB 2022 was used for signal processing; the Python language was used to code the algorithms and the metrics in an online environment. This was implemented on Google Research's Colaboratory, which allows the execution of different programming languages.
The Train-Test Split method of the Cross-Validation technique was used, which consists of randomly partitioning the data series. The method is considered accurate because combinations of training and test data are evaluated, and the number of iterations depends on the size of the data set. Usually, 80% of the data are reserved for training the Machine Learning model and the remaining 20% are used to test the algorithm for validation, as applied in this research.
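A seeded 80/20 split of this kind can be sketched with the standard library alone. The function below is a hypothetical stand-in for whatever library routine was used in the original notebooks; the seed of 4 follows the value quoted in the figure captions, and the particular shuffle it produces will differ from that of other libraries given the same seed.

```python
import random

def train_test_split(samples, labels, test_fraction=0.2, seed=4):
    """Shuffle indices with a fixed seed, then hold out test_fraction for testing."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return (pick(samples, train_idx), pick(samples, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))

# Ten hypothetical feature rows with labels 0-4.
rows = list(range(10))
classes = [v % 5 for v in rows]
x_train, x_test, y_train, y_test = train_test_split(rows, classes)
```

Fixing the seed makes the partition reproducible, so different classifiers can be compared on exactly the same training and test rows.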
A diagram in Figure 7 shows the process followed by the EOG signal with the implemented techniques.

Results and Discussion
The results obtained for the eye movements with the different classification algorithms were analyzed in terms of sensitivity and specificity. These metrics give the percentage that corresponds to the most frequently classified values.
We applied the entropy method to the data obtained from the transform to determine which wavelet to apply as the Mother Wavelet, as shown for the Vertical EOG in Table 4 and the Horizontal EOG in Table 5. We analyzed the wavelet families Haar (haar), Coiflets (coif), Symlets (sym), Fejér-Korovkin filters (fk), Discrete Meyer (meyr), Biorthogonal (bio) and Reverse Biorthogonal (rbio), which were the ones that showed similarity with the original signal. Table 4 shows the results for the rbio family among the wavelet families.
With the data obtained, it was determined that the Reverse Biorthogonal wavelet would be used on the signal, becoming the Mother Wavelet. After the application of entropy, this family showed lower amounts of information disorder, and it contains members whose characteristics allow a smoothed signal to be obtained without losing important data of the original signal. Entropy was applied to each of the 14 members of the Reverse Biorthogonal (rbio) family, showing that member 3.1 was one of those with the least amount of entropy; it was then selected as the Mother Wavelet by visual comparison with the other members, showing its five levels of detail coefficients in both EOG channels, which allowed significant data to be obtained from the original signal. Each result is shown in Table 6 for the Vertical EOG channel and in Table 7 for the Horizontal EOG channel.
The Bior 3.1 family member in the Vertical channel is shown as marked. Each level of detail coefficient allows the signal to be viewed by segments, finding characteristic points in the parameters. Five levels were analyzed, of which level 4 showed a level of smoothing in the signal, allowing the beginning and end of each of the EOG movements to be found.

Confusion Matrix
The confusion matrix was applied to the results of both EOG channels, and the results are shown visually in Figure 8 below.
(Table 7) and Horizontal EOG (Table 8), each number from 0 to 17 represents the 18 labels where the name of the feature is indicated: the one that corresponds to each channel and its location number in the matrix.
Table 8 shows the positive values, negative values, false positives and false negatives for each of the characteristics of the Horizontal EOG channel; these data are important for the calculation of the confusion matrix terms. Table 9 shows the same values for the Vertical EOG channel.
Table 9. Parameters and characteristics of the EOG Signal Vertical channel.
The results of the aforementioned calculations are shown in Table 10 for the Horizontal EOG channel and Table 11 for the Vertical EOG channel, each corresponding to the result of the characteristic applied to the signal.
The calculation of the data of the confusion matrix in the Horizontal channel is shown. The results of the calculation for sensitivity, specificity, accuracy, and precision in the Vertical EOG channel are given.
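The four terms reported for each channel follow from the counts of true positives, false positives, true negatives and false negatives. The sketch below computes them for one class treated as positive against all others (a one-vs-rest reading that is assumed here); the labels in the example are made up, and returning 0.0 when a denominator is zero is a convention chosen for the sketch.

```python
def confusion_terms(y_true, y_pred, positive):
    """Counts (tp, fp, tn, fn) with one class as positive and all others as negative."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def rates(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy and precision from the four counts."""
    def ratio(num, den):
        return num / den if den else 0.0
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),
    }

# Hypothetical labels for three classes, scoring class 1.
counts = confusion_terms([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], positive=1)
summary = rates(*counts)
```

Repeating the calculation with each class in turn as the positive class gives one row of terms per class.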

ROC Curve
The results obtained from the ROC curve are presented in Table 12 for the KNN algorithm, Table 13 for SVM and Table 14 for DT. They show the sensitivity and specificity obtained after the complete signal was input to the classifier, making it possible to compare the terms across the results; SVM showed a higher sensitivity index, consistent with the calibration performed for this algorithm.
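A ROC curve plots sensitivity (true positive rate) against 1 − specificity (false positive rate) as the cutting point is varied. The sketch below builds those points for binary labels and classifier scores and integrates the area under the curve; treating the problem as binary (for example, one movement against the rest) and the example scores are assumptions, and both classes must be present in `y_true`.

```python
def roc_points(y_true, scores):
    """(false positive rate, true positive rate) for each distinct score used as cut-off."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= cut and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= cut and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical scores for four samples (1 = target movement, 0 = any other).
curve = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
area = auc(curve)
```

Points above the diagonal from (0, 0) to (1, 1) indicate better-than-chance classification, and an area of 0.5 corresponds to the diagonal itself.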

Cutting Point | Sensitivity | Specificity
0.5 | 50% | 50%

Table 13. Sensitivity and specificity of SVM algorithm of the best classification.

Cutting Point | Sensitivity | Specificity
0.5 | 50% | 50%
The results obtained from applying the ROC curve to the KNN, SVM and DT algorithms are explained below with Figure 9, Figure 10 and Figure 12, respectively.

(a) KNN
Figure 9. KNN algorithm. Two combinations of neighbors (K) were used, with K = 4 and K = 10, and the nearest neighbor was K = 1. The algorithm used a training percentage of 80% and 20% testing, with seed = 4.
As the signal changed, the ROC curve changed suddenly; this was because the first two movements (look up and look down) presented similarities in voltage amplitude. The eye movement transition was noticed when the blinking movement was performed, followed by the left and right movements, because the voltage had a significant range of change relative to the first movements; this was noticed in the graph when the classification of true positives resumed. The KNN algorithm has difficulty classifying when the signal presents similar data and when the data volume is large. It was also seen that the higher the volume of data, the further the result was from the correct classification, due to the number of distance calculations.

Cutting Point | Sensitivity | Specificity
0.5 | 50% | 50%

The cutoff point was obtained, and it was visualized as the ROC line changed direction, a consequence of the variation in the signal data; however, the ROC curve shows no success in classification in the first half of the data, with the classification improving in the rest of the signal.

(b) SVM
By changing the value of C, the hyperplane of the Kernel was modified. A calibration process was performed (Figure 11), where the value was adjusted, and by choosing a value close to 0, it became closer to some points than others; basically, there was no restriction, and we ended up with a hyperplane that did not classify anything. Since the data were linearly separable, a large C could be used, but this may have been an outlier, and that is why we used a hyperplane very close to the margin with no outlier. We tried several values, and we can say that the selected one provided the freedom to our classifier.
Cross-validation was used with the total data, where 80% was used for training and the remaining 20% for testing; in this graph, after the calibration of C, the classification improved.
The sensitivity and specificity (Table 13) of the SVM algorithm were calculated with the results obtained with the classification.
The data show a change in the cutoff point; throughout the signal, they were shown above the diagonal that divides the ROC space. The points above the diagonal represent good classification results; they became better as the signal classification progressed.

(c) DT
Figure 12. DT algorithm. Implementation of 10 nodes, with a training percentage of 80%, 20% of test data and a seed or random state equal to 4.
Table 14 shows the cut-off point, sensitivity and specificity of the ROC curve result of the DT algorithm.
Inconsistency was shown in the classification, where no improvement was evident at any point in the signal, indicating that this was an inefficient algorithm for classification.

Jaccard Index
Table 15 shows the results of the signal classified with the three classification algorithms with the application of the Jaccard metric, as explained above. In the analysis of the classification results, SVM obtained the best result compared to the other algorithms in the test column, with 76.9%.
The KNN, SVM and DT algorithms were selected for reasons of explainability, since the article focuses on the health area.

Conclusions
Performing routine activities is a problem for people with motor disabilities that impacts their quality of life. For this reason, the research presented in this paper concerns the acquisition of two EOG channels, which allows data to be acquired from different eye movements, together with the implementation of the Reverse Biorthogonal 3.1 wavelet to identify the different waveforms of the signal through acquisition windows. This process improved the responses of the supervised classifiers KNN, SVM and DT, and the efficiency level of each algorithm was checked through the Jaccard index metric. The best result, 76.9%, was obtained for the SVM classifier in the Jaccard index metric; compared with the reported state of the art, it exceeded the efficiency of supervised classifiers reported in [13], with values of 69.75%. This translated into better data classification. The program codes and methods implemented in this research are provided at: https://acortar.link/nW8l0s (accessed on 4 May 2023).
For future work, we propose the use of these classifiers by implementing them in different tools, such as a human-machine interface to support assistance and interaction with different users, and applying it in the medical area to reduce the response time and the learning curve of inexperienced users.
Institutional Review Board Statement: No ethical review and approval were required for the study on human participants in accordance with local legislation and institutional requirements. The participants provided their written informed consent to participate in this study. Written informed consent was obtained from the person for the publication of potentially identifiable images or data included in this article.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study and written informed consent has been obtained from the patient to publish this paper.
Data Availability Statement: Information on this research can be obtained from the following link https://acortar.link/nW8l0s (accessed on 4 May 2023).

Conflicts of Interest: The authors declare no conflict of interest.